perspective transformation
Pic2Diagnosis: A Method for Diagnosis of Cardiovascular Diseases from the Printed ECG Pictures
Büyüksolak, Oğuzhan, Öksüz, İlkay
The electrocardiogram (ECG) is a vital tool for diagnosing heart diseases. However, many disease patterns are derived from outdated datasets and traditional stepwise algorithms with limited accuracy. This study presents a method for direct cardiovascular disease (CVD) diagnosis from ECG images, eliminating the need for digitization. The proposed approach utilizes a two-step curriculum learning framework, beginning with the pre-training of a classification model on segmentation masks, followed by fine-tuning on grayscale, inverted ECG images. Robustness is further enhanced through an ensemble of three models with averaged outputs, achieving an AUC of 0.9534 and an F1 score of 0.7801 on the BHF ECG Challenge dataset, outperforming individual models. By effectively handling real-world artifacts and simplifying the diagnostic process, this method offers a reliable solution for automated CVD diagnosis, particularly in resource-limited settings where printed or scanned ECG images are commonly used. Such an automated procedure enables rapid and accurate diagnosis, which is critical for timely intervention in CVD cases that often demand urgent care.
Resource-Efficient Multiview Perception: Integrating Semantic Masking with Masked Autoencoders
Dakic, Kosta, Thilakarathna, Kanchana, Calheiros, Rodrigo N., Lim, Teng Joon
Multiview systems have become a key technology in modern computer vision, offering advanced capabilities in scene understanding and analysis. However, these systems face critical challenges in bandwidth limitations and computational constraints, particularly for resource-limited camera nodes like drones. This paper presents a novel approach for communication-efficient distributed multiview detection and tracking using masked autoencoders (MAEs). We introduce a semantic-guided masking strategy that leverages pre-trained segmentation models and a tunable power function to prioritize informative image regions. This approach, combined with an MAE, reduces communication overhead while preserving essential visual information. We evaluate our method on both virtual and real-world multiview datasets, demonstrating comparable performance in terms of detection and tracking performance metrics compared to state-of-the-art techniques, even at high masking ratios. Our selective masking algorithm outperforms random masking, maintaining higher accuracy and precision as the masking ratio increases. Furthermore, our approach achieves a significant reduction in transmission data volume compared to baseline methods, thereby balancing multiview tracking performance with communication efficiency.
Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision
Understanding the 3D world is a fundamental problem in computer vision. However, learning a good representation of 3D objects is still an open problem due to the high dimensionality of the data and many factors of variation involved. In this work, we investigate the task of single-view 3D object reconstruction from a learning agent's perspective. We formulate the learning process as an interaction between 3D and 2D representations and propose an encoder-decoder network with a novel projection loss defined by the perspective transformation. More importantly, the projection loss enables the unsupervised learning using 2D observation without explicit 3D supervision. We demonstrate the ability of the model in generating 3D volume from a single 2D image with three sets of experiments: (1) learning from single-class objects; (2) learning from multi-class objects and (3) testing on novel object classes. Results show superior performance and better generalization ability for 3D object reconstruction when the projection loss is involved.
Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation
Katiyar, Sulabh, Borgohain, Samir Kumar
Image Captioning, or the automatic generation of descriptions for images, is one of the core problems in Computer Vision and has seen considerable progress using Deep Learning Techniques. We propose to use Inception-ResNet Convolutional Neural Network as encoder to extract features from images, Hierarchical Context based Word Embeddings for word representations and a Deep Stacked Long Short Term Memory network as decoder, in addition to using Image Data Augmentation to avoid over-fitting. For data Augmentation, we use Horizontal and Vertical Flipping in addition to Perspective Transformations on the images. We evaluate our proposed methods with two image captioning frameworks- Encoder-Decoder and Soft Attention. Evaluation on widely used metrics have shown that our approach leads to considerable improvement in model performance.
Data Set and Data Augmentation for Face Detection and Recognition
When it comes to building an Artificially Intelligent (AI) application, your approach must be data first, not application first. Dependencies on data cost more than software dependencies, but are constantly overlooked. To build a face detection and/or face recognition model it's important to know available data set and data augmentation approaches to be followed for training the model. Note that there's difference between Face Identification and Face Recognition. It is process of comparing face image with claimed identity, basically it is a "One-to-one matching".
Lane Detection with Deep Learning (Part 2) – Towards Data Science
This is part two of my deep learning solution for lane detection, which covers the actual models I created in finding my final approach to the problem, as well as some potential improvements. Be sure to read Part One for the limitations of my previous approaches as well as the preliminary data used prior to the changes I made below. The code and data mentioned here and in the earlier post can be found in my Github repo. With a decent dataset created, I was ready to make my first model for using deep learning to detect lane lines. You may be asking, "Wait, I thought you were trying to get rid of perspective transformation?"
Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision
Yan, Xinchen, Yang, Jimei, Yumer, Ersin, Guo, Yijie, Lee, Honglak
Understanding the 3D world is a fundamental problem in computer vision. However, learninga good representation of 3D objects is still an open problem due to the high dimensionality of the data and many factors of variation involved. In this work, we investigate the task of single-view 3D object reconstruction from a learning agent's perspective. We formulate the learning process as an interaction between 3D and 2D representations and propose an encoder-decoder network with a novel projection loss defined by the perspective transformation. More importantly, the projection loss enables the unsupervised learning using 2D observation without explicit 3D supervision. We demonstrate the ability of the model in generating 3D volume from a single 2D image with three sets of experiments: (1) learning from single-class objects; (2) learning from multi-class objects and (3) testing on novel object classes. Results show superior performance and better generalization ability for 3D object reconstruction when the projection loss is involved.